{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 106 - Quantile Regression with LightGBM\n",
    "\n",
    "We will demonstrate how to use the LightGBM quantile regressor with\n",
    "TrainRegressor and ComputeModelStatistics on the Triazines dataset.\n",
    "\n",
    "\n",
    "This sample demonstrates how to use the following APIs:\n",
    "- [`TrainRegressor`\n",
    "  ](http://mmlspark.azureedge.net/docs/pyspark/TrainRegressor.html)\n",
    "- [`LightGBMRegressor`\n",
    "  ](http://mmlspark.azureedge.net/docs/pyspark/LightGBMRegressor.html)\n",
    "- [`ComputeModelStatistics`\n",
    "  ](http://mmlspark.azureedge.net/docs/pyspark/ComputeModelStatistics.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "triazines = spark.read.format(\"libsvm\")\\\n",
    "    .load(\"wasbs://publicwasb@mmlspark.blob.core.windows.net/triazines.scale.svmlight\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# print some basic info\n",
    "print(\"records read: \" + str(triazines.count()))\n",
    "print(\"Schema: \")\n",
    "triazines.printSchema()\n",
    "triazines.limit(10).toPandas()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Split the dataset into train and test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train, test = triazines.randomSplit([0.85, 0.15], seed=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Train the quantile regressor on the training data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mmlspark.lightgbm import LightGBMRegressor\n",
    "model = LightGBMRegressor(objective='quantile',\n",
    "                          alpha=0.2,\n",
    "                          learningRate=0.3,\n",
    "                          numLeaves=31).fit(train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can save and load LightGBM to a file using the LightGBM native representation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mmlspark.lightgbm import LightGBMRegressionModel\n",
    "model.saveNativeModel(\"mymodel\")\n",
    "model = LightGBMRegressionModel.loadNativeModelFromFile(\"mymodel\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "View the feature importances of the trained model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(model.getFeatureImportances())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Score the regressor on the test data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "scoredData = model.transform(test)\n",
    "scoredData.limit(10).toPandas()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compute metrics using ComputeModelStatistics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mmlspark.train import ComputeModelStatistics\n",
    "metrics = ComputeModelStatistics(evaluationMetric='regression',\n",
    "                                 labelCol='label',\n",
    "                                 scoresCol='prediction') \\\n",
    "            .transform(scoredData)\n",
    "metrics.toPandas()"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
